Web mining with relational clustering

Authors

  • Thomas A. Runkler
  • James C. Bezdek
Abstract

Clustering is an unsupervised learning method that determines partitions and (possibly) prototypes from pattern sets. Sets of numerical patterns can be clustered by alternating optimization (AO) of clustering objective functions or by alternating cluster estimation (ACE). Sets of non-numerical patterns can often be represented numerically by (pairwise) relations. These relational data sets can be clustered by relational AO and by relational ACE (RACE). We consider two kinds of non-numerical patterns provided by the World Wide Web: document contents, such as the text parts of web pages, and sequences of web pages visited by particular users, so-called web logs. The analysis of document contents is often called web content mining, and the analysis of log files with web page sequences is called web log mining. For both non-numerical pattern types (text and web page sequences), relational data sets can be automatically generated using the Levenshtein (edit) distance or using graph distances. The prototypes found for text data can be interpreted as keywords that serve for document classification and automatic archiving. The prototypes found for web page sequences can be interpreted as prototypical click streams that indicate typical user interests, and therefore serve as a basis for web content and web structure management.
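The abstract's step of turning non-numerical patterns into a relational data set can be illustrated with a minimal Python sketch. It assumes each pattern is a string or a list of page identifiers and uses the standard dynamic-programming form of the Levenshtein distance. The function names `levenshtein` and `relation_matrix` and the sample click streams are hypothetical, chosen for illustration only; this is not the authors' implementation.

```python
from typing import List, Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance between two sequences (strings or lists of page ids)."""
    # Keep the shorter sequence in the inner loop to limit row length.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]  # deleting the first i items of a
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def relation_matrix(patterns: Sequence[Sequence]) -> List[List[int]]:
    """Symmetric pairwise distance matrix with a zero diagonal."""
    n = len(patterns)
    r = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r[i][j] = r[j][i] = levenshtein(patterns[i], patterns[j])
    return r


# Hypothetical click streams: each session is a sequence of visited pages.
sessions = [
    ["home", "products", "cart"],
    ["home", "products", "checkout"],
    ["home", "about"],
]
R = relation_matrix(sessions)
```

A matrix such as `R` is the kind of pairwise relational input that a relational clustering method operates on; the clustering step itself is not shown here.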

Similar resources

Low-complexity fuzzy relational clustering algorithms for Web mining

This paper presents new algorithms (Fuzzy c-Medoids or FCMdd and Robust Fuzzy c-Medoids or RFCMdd) for fuzzy clustering of relational data. The objective functions are based on selecting c representative objects (medoids) from the data set in such a way that the total fuzzy dissimilarity within each cluster is minimized. A comparison of FCMdd with the well-known Relational Fuzzy c-Means algorit...

Orthogonal Nonnegative Matrix Factorization for Multi-type Relational Clustering

Relational clustering with heterogeneous data objects has impact in various important applications, such as web mining, text mining and bioinformatics etc. In this paper, we build a star-structured general model for relational clustering. It is formulated as an orthogonal tri-nonnegative matrix factorization. The model performs matrix approximation among all different data types to look for hid...

Relational fuzzy approach for mining user profiles

Capturing the characteristics and preferences of Web users into user profiles is a fundamental task to perform in order to implement forms of personalization on a Web site. In this paper, we present a relational fuzzy clustering approach to extract significant user profiles from session data derived from log files. In particular, a modified version of the CARD clustering algorithm is proposed i...

Knowledge Discovery meets Linked APIs

Knowledge Discovery and Data Mining (KDD) is a very well-established research field with useful techniques that explore patterns and regularities in large relational, structured and unstructured datasets. Theoretical and practical developments in this field have led to useful and scalable solutions for the tasks of pattern mining, clustering, graph mining, and predictions. In this paper, we demonstra...

Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems

One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user's item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...

Journal:
  • Int. J. Approx. Reasoning

Volume 32

Publication date: 2003